Cooperative Inverse Reinforcement Learning

Authors

  • Dylan Hadfield-Menell
  • Stuart J. Russell
  • Pieter Abbeel
  • Anca D. Dragan
Abstract

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm.
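The core setup in the abstract, a robot that shares the human's reward but must infer it from the human's behavior, can be illustrated with a very small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-valued reward parameter `theta`, the match-the-parameter `reward`, the noisily rational human model with temperature `beta`, and all function names are hypothetical. It shows only a single Bayesian belief update followed by a greedy robot choice, not the paper's POMDP reduction or its approximate algorithm.

```python
import math

# Hypothetical toy setup (not from the paper): two candidate reward
# parameters and two actions. The shared reward is 1 when an action
# matches theta and 0 otherwise.
THETAS = [0, 1]
ACTIONS = [0, 1]


def reward(action, theta):
    """Shared reward: both agents are scored by the human's reward function."""
    return 1.0 if action == theta else 0.0


def human_likelihood(action, theta, beta=2.0):
    """Assumed noisily rational human: P(action | theta) ~ exp(beta * reward)."""
    weights = [math.exp(beta * reward(a, theta)) for a in ACTIONS]
    return weights[ACTIONS.index(action)] / sum(weights)


def update_belief(belief, human_action, beta=2.0):
    """Bayes rule over theta after the robot observes one human action."""
    posterior = [
        b * human_likelihood(human_action, t, beta)
        for b, t in zip(belief, THETAS)
    ]
    total = sum(posterior)
    return [p / total for p in posterior]


def robot_action(belief):
    """Pick the action with the highest expected shared reward under the belief."""
    expected = [
        sum(b * reward(a, t) for b, t in zip(belief, THETAS))
        for a in ACTIONS
    ]
    return ACTIONS[expected.index(max(expected))]


# The robot starts with a uniform belief, watches the human take action 1,
# and then acts on its updated belief about theta.
belief = [0.5, 0.5]
belief = update_belief(belief, human_action=1)
chosen = robot_action(belief)
```

In this sketch the human acts without regard to what the robot will learn from the action. The abstract's point is that an optimal CIRL solution need not look like this: the human may choose actions partly for their teaching value, which this one-step update does not model.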

Similar resources

Maximum Likelihood Inverse Reinforcement Learning

OF THE DISSERTATION MAXIMUM LIKELIHOOD INVERSE REINFORCEMENT LEARNING

Preference elicitation and inverse reinforcement learning

We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent’s preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relati...

Inverse Reinforcement Learning Under Noisy Observations (Extended Abstract)

We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, noisy observations of the trajectory are available. We generalize the previous method of expectation-maximization for inverse reinforcement learning, which allows the trajectory of the expert to be partially hidden from the learner, to incorpo...

Reinforcement Learning in Cooperative Multi–Agent Systems

Reinforcement learning is used in cooperative multi-agent systems in different ways for different problems. We provide a review of learning algorithms used for repeated common-payoff games and stochastic general-sum games. These learning algorithms are then compared with another algorithm for the credit assignment problem that attempts to correctly assign agents the awards that they deserve.

Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing

In this paper, we apply reinforcement learning for automatically learning cooperative persuasive dialogue system policies using framing, the use of emotionally charged statements common in persuasive dialogue between humans. In order to apply reinforcement learning, we describe a method to construct user simulators and reward functions specifically tailored to persuasive dialogue based on a cor...

Publication date: 2016